A contact map is a simple representation of the structure of proteins and other chain-like
macromolecules. This representation is quite amenable to numerical studies of folding. We show that
the number of contact maps corresponding to the possible configurations of a polypeptide chain of
$N$ amino acids, represented by $(N - 1)$-step self-avoiding walks on a lattice, grows exponentially
with $N$ for all dimensions $D > 1$. We carry out exact enumerations in $D = 2$ on the square and
triangular lattices for walks of up to 20 steps and investigate various statistical properties of contact
maps corresponding to such walks. We also study the exact statistics of contact maps generated
by walks on a ladder.

PACS numbers: 87.15.By, 87.10.+e, 05.50.+q

I. INTRODUCTION 

Prediction of a protein's structure from its amino acid sequence is an important and challenging open
problem. The first choice one has to make when approaching the problem is that of structure representation.
One of the most minimalist representations of a protein's structure is in terms of its contact map which,
for a polypeptide chain of length $N - 1$, is an $N \times N$ matrix $S$. Denoting by $i, j$ the position
indices of the amino acids along the chain, the elements of $S$ are defined as

$$ S_{ij} = \begin{cases} 1 & \text{if amino acids } i \text{ and } j \text{ are in contact} \\ 0 & \text{otherwise} \end{cases} \qquad (1) $$

"Contact" can be defined in various ways: for example, one can set $S_{ij} = 1$ when there exist two heavy
(all but hydrogen) atoms, one from amino acid $i$ and one from $j$, separated by less than some threshold
distance. Contact maps are independent of the coordinate frame used, and for compact structures with many
contacts, such as the native states of proteins, it is relatively easy to go from a map to a set of possible
structures to which it may correspond. On a lattice, a protein conformation, or fold, is represented as
a self-avoiding random walk (SAW). A site on the lattice visited by the walk represents an amino acid.
Two sites of the SAW are in contact if they are nearest neighbors on the lattice and are non-consecutive
along the walk.

To search for a protein's native structure in the space of contact maps (as has been proposed by several
groups), it is important to have general knowledge about the size and nature of this space. Recent studies
of the dynamics of naturally occurring proteins have shown that contact maps, along with simple
energetics, carry enough information to reproduce the vibrational spectrum with some accuracy. This makes it
important to understand the statistics of the contact map representation. In particular, one would like to
know how the number of different physical contact maps depends on the chain length $N$. Clearly one has
$2^{N(N+1)/2}$ distinct $N \times N$ symmetric matrices of binary elements $S_{ij} = 0, 1$. Most of these,
however, do not correspond to physical structures: they cannot be realized as contact maps of real, physical
chains or SAWs.

In fact, as we shall see, $N_M$, the total number of physical maps obtainable for a chain of length $N$ on a
lattice, satisfies the bounds

$$ e^{c_l N} < N_M < e^{c_u N} . \qquad (2) $$

The upper bound in (2) follows trivially from the bound on $N_{SAW}$, the total number of SAWs, which is
asymptotically given by

$$ N_M < N_{SAW} \sim N^{\gamma - 1} \mu^N \sim e^{c_u N} , \qquad c_u = \ln \mu , \qquad (3) $$

where $\mu$ is the connective constant of the lattice and $\gamma$ is a critical exponent.

A simple construction of a special set of walks, each with a distinct contact map, provides the lower bound.
Start the chain at the origin, $i = 1$; the first step and all odd-indexed steps are in the positive horizontal
direction $+x$, whereas every even-indexed step is in the vertical direction, either $+y$ or $-y$. The decision
taken for step $2i$ either brings the site $2i + 1$ into contact with $2i - 2$, in which case
$S_{2i-2,2i+1} = 1$, or this contact is absent and $S_{2i-2,2i+1} = 0$. Hence for every choice of the set of
vertical steps we get a walk whose contact map does not appear for any different walk from the set. Clearly,
the maps constructed this way must have $S_{2i-k,2i+1} = 0$ for $k > 2$. In this way we obtain $N'_{SAW}$
SAWs and the following exponential lower bound for the number of contact maps:

$$ N_M > N'_{SAW} \sim 2^{N/2} \sim e^{c_l N} . \qquad (4) $$

Clearly the argument works for any dimension and can be extended to the triangular lattice. This (rather
poor) bound can be improved by including walks whose maps can have other non-vanishing elements, e.g.
with $S_{2i-4,2i+1} = 1$. A better lower bound for the square lattice is obtained by an explicit construction
given in Section 3.
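The zigzag construction behind Eq. (4) can be checked directly for short chains. The following is a minimal sketch, not the authors' code: sites are indexed from 0 rather than 1, the function names are hypothetical, and the first vertical step is fixed to $+y$ because a walk and its mirror image share the same map. Under these assumptions, $n$ vertical steps give $2^{n-1}$ walks with pairwise distinct maps.

```python
from itertools import product

def zigzag_walk(vertical_signs):
    """Sites of a walk whose odd steps go +x and whose even steps go +y (s=+1) or -y (s=-1)."""
    x, y = 0, 0
    sites = [(x, y)]
    for s in vertical_signs:
        x += 1                      # odd-indexed step: +x
        sites.append((x, y))
        y += s                      # even-indexed step: vertical
        sites.append((x, y))
    return sites

def contact_map(sites):
    """Pairs (i, j) with j > i + 1 whose sites are lattice nearest neighbors (0-based)."""
    return frozenset(
        (i, j)
        for i in range(len(sites))
        for j in range(i + 2, len(sites))
        if abs(sites[i][0] - sites[j][0]) + abs(sites[i][1] - sites[j][1]) == 1
    )

def count_zigzag_maps(n_vertical):
    """Distinct maps among zigzag walks with n_vertical vertical steps, the first fixed to +y."""
    maps = set()
    for rest in product((+1, -1), repeat=n_vertical - 1):
        sites = zigzag_walk((+1,) + rest)
        assert len(set(sites)) == len(sites)    # the walk is self-avoiding
        maps.add(contact_map(sites))
    return len(maps)

for n_vertical in range(1, 8):
    print(2 * n_vertical, count_zigzag_maps(n_vertical))   # steps, distinct maps
```

Each reversal of the vertical direction creates exactly one contact, so the map records the full sequence of choices after the first one.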

We actually expect that $\ln(N_M)/N$ approaches a limit,

$$ a = \lim_{N \to \infty} \frac{\ln N_M}{N} , \qquad (5) $$

as $N$ becomes large (the existence of a limit does not follow directly from (2)). To estimate $a$ we computed,
for $N \le 20$, the precise numbers $N_M$ on the square and triangular lattices. This is done by exact
enumeration of all possible distinct SAWs, i.e. those not related by symmetry operations of the lattice, and
of the corresponding contact maps.
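For very short chains the enumeration can be reproduced with a few lines of code. The sketch below is only illustrative and makes some assumptions that differ from the text: the argument is the number of steps (one less than the number of sites), all orientations of a walk are generated rather than one representative per symmetry class, and the function names are hypothetical. Symmetry-related walks share the same contact map, so the number of distinct maps is not affected by this choice.

```python
MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))

def enumerate_saws(n_steps, sites=((0, 0),)):
    """Yield every n-step self-avoiding walk on the square lattice as a tuple of sites."""
    if len(sites) == n_steps + 1:
        yield sites
        return
    x, y = sites[-1]
    for dx, dy in MOVES:
        nxt = (x + dx, y + dy)
        if nxt not in sites:
            yield from enumerate_saws(n_steps, sites + (nxt,))

def contact_map(sites):
    """Pairs (i, j), j > i + 1, of sites that are nearest neighbors on the lattice."""
    return frozenset(
        (i, j)
        for i in range(len(sites))
        for j in range(i + 2, len(sites))
        if abs(sites[i][0] - sites[j][0]) + abs(sites[i][1] - sites[j][1]) == 1
    )

def count_walks_and_maps(n_steps):
    """Return (number of walks, number of distinct contact maps)."""
    walks = 0
    maps = set()
    for sites in enumerate_saws(n_steps):
        walks += 1
        maps.add(contact_map(sites))
    return walks, len(maps)

for n in range(1, 9):
    print(n, *count_walks_and_maps(n))
```

The cost grows exponentially with the number of steps, which is why this brute-force form is only practical for short walks.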

Using this enumeration for low N and sampling for larger N, we also computed various other statistics of 
contact maps, such as the number of maps with particular density of contacts, the number of SAW's that 
correspond to this set of maps, etc. We also obtained explicit results about the corresponding quantities for 
walks on a special "ladder" lattice. 



II. EXACT ENUMERATION IN D = 2 

In the upper curve of Fig. 1 we plot the number of walks $N_{SAW}$, obtained by complete enumeration,
versus $N$, fitted (for $N \le 25$) with the known estimates $\mu = 2.6381585$ for the connective constant and
$\gamma = 43/32$ for the critical exponent on the 2D square lattice. The lower curve is the total number $N_M$
of contact maps corresponding to all possible SAWs with $N \le 20$ on a 2D square lattice. Fitting Eq. (5), we
obtain $a = 0.83(1)$. This result was obtained previously, by enumerating walks with $N \le 14$, by Chan and
Dill [10]. For comparison, we note that a straightforward fit of $N_{SAW}$ with Eq. (3) gives the upper bound
prefactor $c_u = 1.00(1)$, and that the lower bound prefactor, from Eq. (4), is $c_l = 0.346$. We obtained the
corresponding results for the triangular lattice. In this case, however, due to the higher density of contacts,
we were able to obtain results only for $N \le 11$, as shown in Fig. |]. Our fit gave $c_u = 1.47(1)$ and
$a = 1.28(1)$.

To address the question whether in $D = 2$ the constant $a$ for the contact maps is strictly less than
$\ln \mu$, we present in Fig. 2 a plot of the running value of the connective constant $\mu$ for the walks and
the running value of $\exp(a)$ for the maps as the size of the walk increases. The running slope $\mu(N)$ is
computed from enumeration data using the standard formula

$$ \ln \mu(N) = \frac{1}{2} \left[ \ln N_{SAW}(N + 1) - \ln N_{SAW}(N - 1) \right] , \qquad (6) $$

and an analogous one for the factor $a$ for maps. This figure is consistent with $a < \ln \mu$.
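As an illustration of Eq. (6), the running estimate $\mu(N)$ can be evaluated from brute-force counts of short walks. This is a sketch under stated assumptions: the counts include all orientations of each walk (a constant symmetry factor cancels in the ratio), $N$ is taken as the number of steps, and the helper names are hypothetical.

```python
import math

def count_saws(n_steps):
    """Number of n-step self-avoiding walks on the square lattice, all orientations included."""
    def extend(pos, visited, remaining):
        if remaining == 0:
            return 1
        x, y = pos
        total = 0
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nxt not in visited:
                visited.add(nxt)
                total += extend(nxt, visited, remaining - 1)
                visited.remove(nxt)
        return total
    return extend((0, 0), {(0, 0)}, n_steps)

def running_mu(counts, n):
    """Eq. (6): ln mu(N) = (1/2) [ln N_SAW(N+1) - ln N_SAW(N-1)]."""
    return math.exp(0.5 * (math.log(counts[n + 1]) - math.log(counts[n - 1])))

counts = {n: count_saws(n) for n in range(1, 11)}
for n in range(2, 10):
    print(n, counts[n], round(running_mu(counts, n), 4))
```

Using walks two steps apart suppresses the even-odd oscillation of the counts on a bipartite lattice; the same function applied to the map counts gives the running value of $\exp(a)$.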
FIG. 1. Upper curve: $N_{SAW}$, the number of SAWs versus their length $N$, obtained by exact enumeration for
$N \le 25$ on the 2D square lattice and fitted with Eq. (3). The lower curve shows the exponential variation of
$N_M$, the number of contact maps corresponding to all possible SAWs with $N \le 20$. Data were obtained from
complete enumeration.

FIG. 2. A comparison of the connective constant $\mu$ for the walks and the exponential growth factor $\exp(a)$
for the contact maps generated on a square lattice, evolving with the size of the walk. Horizontal lines are the
known value $\mu = 2.638$ and our estimate $e^a = 2.3$, based on the data for all $N$.



Most biologically functional proteins fold into remarkably compact conformations, with very few solvent 
molecules in the interior. Therefore it is of interest to consider how the number of contact maps and their 
corresponding walks varies with the number of contacts. 

Denote by $N_{SAW}(c)$ the number of walks with a fixed number $N_c$ of contacts, where $c = N_c/N$ is the
fraction of contacts. When there is an interaction energy $u$ associated with each contact, $\ln N_{SAW}(c)$ is
identical to the entropy of the chain at energy $E = N_c u$. In Fig. 3 we plot the fractions

$$ n_{SAW}(c) = \ln \left( N_{SAW}(c)/N \right) \qquad (7) $$

for chains of different lengths $N$ on the 2D square lattice.
FIG. 3. Logarithm $n_{SAW}(c)$ of the fraction of walks with a given fraction $c$ of contacts. On the square
lattice, we show data obtained from exact enumeration for chain lengths $N = 14, 16, 18, 20$, and data obtained
from sampling for $N = 64, 128, 256$. For clarity, errors on data from sampling are shown only for $N = 64$. On
the triangular lattice, we show the fraction $n_{SAW}(c)$ of walks with a given number $N_c$ of contacts for
$N = 9, 10, 11$.

The time required to enumerate walks and maps increases exponentially with the size $N$, so exact enumeration
quickly becomes impractical. However, we want to generate the statistics for larger values of $N$, which
correspond to the actual physical situation. Standard techniques are routinely used to generate unbiased
samples of SAWs on the lattice. We use the method of incomplete enumeration (Redner-Reynolds) to generate our
sample of unbiased SAWs.
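The idea of incomplete enumeration can be sketched as follows: the enumeration tree is explored as in an exact enumeration, but every branch is kept only with probability $p$. Each complete $N$-step walk then survives with the same probability $p^N$, so the surviving walks form an unbiased sample. The code below is a minimal illustration of this idea, not the implementation used for the results in the text; the function name, the single fixed value of $p$, and the use of Python's `random.Random` are assumptions.

```python
import random

def incomplete_enumeration(n_steps, p, rng):
    """Return the n-step SAWs that survive when each branch is kept with probability p."""
    moves = ((1, 0), (-1, 0), (0, 1), (0, -1))
    sample = []

    def extend(sites, visited):
        if len(sites) == n_steps + 1:
            sample.append(tuple(sites))
            return
        x, y = sites[-1]
        for dx, dy in moves:
            nxt = (x + dx, y + dy)
            if nxt in visited:
                continue                    # would violate self-avoidance
            if rng.random() >= p:
                continue                    # prune this branch with probability 1 - p
            sites.append(nxt)
            visited.add(nxt)
            extend(sites, visited)
            sites.pop()
            visited.remove(nxt)

    extend([(0, 0)], {(0, 0)})
    return sample

rng = random.Random(12345)
walks = incomplete_enumeration(12, 0.5, rng)
print(len(walks))
```

The expected number of surviving walks is $N_{SAW} \, p^N$, so choosing $p$ close to $1/\mu$ keeps the sample size from growing or shrinking exponentially with $N$. With $p = 1$ the procedure reduces to exact enumeration.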

We use our sample to generate the distribution of the fraction of the walks with a given number of contacts,
$n_{SAW}(N_c)$ as introduced before, for SAWs of length $N = 64, 128, 256$. In Fig. 3 we plot the result.
FIG. 4. The collapse of the distributions (for three different lengths) of the fraction of walks with a given
number of contacts after rescaling the finite-size variables.

One would like to say something about how this distribution looks in the asymptotic limit. We try to
analyze this by rescaling the finite-size variables such that the distributions collapse on top of each other.
In Fig. 4, we observe that normalizing the variance to 1 and the mean to 0 results in the collapse of the
distributions (for the three different lengths $N = 64, 128, 256$). We compare this to the normalized Gaussian.
From the data obtained, it appears that we cannot rule out either possibility (Gaussian or non-Gaussian).
We also list the kurtosis values $K$ obtained for the different data sets. For a Gaussian distribution, we expect
an exact value of 3.00.
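The rescaling and the kurtosis can be computed from a histogram of contact numbers as in the following sketch. The function names are hypothetical, and the example histogram is a toy binomial distribution chosen only because its kurtosis is known exactly; it is not data from this work.

```python
import math

def standardized_moments(histogram):
    """histogram maps a contact number N_c to a count; returns (mean, variance, kurtosis)."""
    total = sum(histogram.values())
    mean = sum(nc * w for nc, w in histogram.items()) / total
    var = sum(w * (nc - mean) ** 2 for nc, w in histogram.items()) / total
    m4 = sum(w * (nc - mean) ** 4 for nc, w in histogram.items()) / total
    return mean, var, m4 / var ** 2

def rescale(histogram):
    """Return (z, density) points with zero mean and unit variance, comparable across N."""
    mean, var, _ = standardized_moments(histogram)
    sd = math.sqrt(var)
    total = sum(histogram.values())
    # Bins of unit width in N_c have width 1/sd in z, hence the factor sd in the density.
    return [((nc - mean) / sd, sd * w / total) for nc, w in sorted(histogram.items())]

toy = {0: 1, 1: 4, 2: 6, 3: 4, 4: 1}       # binomial(4, 1/2) weights, illustration only
mean, var, kurt = standardized_moments(toy)
print(mean, var, kurt)
```

Plotting the output of `rescale` for several chain lengths on the same axes, together with the standard normal density, gives a collapse plot of the kind described above.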

It is however not clear how one should generate the distribution of the maps with a given number of 
contacts Nm(Nc). While we have standard and efficient algorithms to generate SAWs with the desired 
weight, it seems difficult to generate contact maps which are equally weighted in the sample and not biased 
with their degeneracies. 

Let now $N_M(c)$ be the number of distinct contact maps with $N_c$ contacts. We show in Fig. 5 how the
fractions

$$ n_M(c) = \ln \left( N_M(c)/N \right) \qquad (8) $$

vary with $c$, again for various chain lengths.

FIG. 5. Logarithm $n_M(c)$ of the fraction of contact maps with a given fraction $c$ of contacts, shown for 4
different walk lengths on the 2D square lattice, and for 3 different walk lengths on the triangular lattice.



The main difference between Figs. 3 and 5 is that the distribution of walks has its maximum at smaller
values of $c$ than the distribution of contact maps. This can be understood intuitively by noting that for
small $c$ the number of maps is suppressed in comparison to $N_{SAW}(c)$: for example, there is only a single
contact map with $c = 0$, whereas there are many walks with no contacts. In general, the degeneracy of
contact maps has a non-trivial dependence on the number of their contacts.

Consider walks of length $N$ and denote by

$$ G = e^{Ng} \qquad (9) $$

the degeneracy of a map, i.e. the number of walks corresponding to that contact map. For each value of
$g$ we determined $H(g)$, the number of maps whose degeneracy is $e^{Ng}$. This information is shown in Fig. 6,
where we present $h(g) = \ln H(g)/N$ versus $g$, for walks of length $N = 20$ on the square lattice. We further
analyze the degeneracy by concentrating on the subset of maps with a fixed number of contacts, $N_c$. In Fig. 6
we show results for $N_c = 3, 4, 5, 6, 9$, i.e. $c = 0.15, 0.2, 0.25, 0.3, 0.45$. Not surprisingly, the maps with
a large number of contacts, which correspond to the typical native folds of proteins, generally have small
degeneracy. It is the maps with few contacts which account for the large degeneracy. In general the map with
$c = 0$ (all zero entries in the matrix) has $G > 2^N$, corresponding to all the directed walks with no contacts.
The walks that correspond to maps with different degeneracies differ in the lengths of the contact-free segments
that the walk has. For $N = 20$ and $N_c = 6$ on the square lattice, we measured the length $L$ of the longest
contact-free stretch at the ends of the walk. Maps with low degeneracy have, on the average, $L \sim 1$, whereas
for highly degenerate maps we found, typically, $L \sim 7$ (there are also highly degenerate maps and walks
with long contact-free stretches far from the ends). Clearly, the presence of long stretches free of contacts is
responsible for the high degeneracy of a map.
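The degeneracy of each map, and its average at a fixed number of contacts, can be tabulated for short walks as in the sketch below. It makes the same assumptions as a plain brute-force enumeration: the argument is the number of steps, all orientations of a walk are counted (so degeneracies are larger by a symmetry factor than when one representative per symmetry class is kept), and the function names are hypothetical.

```python
from collections import Counter, defaultdict

MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))

def enumerate_saws(n_steps, sites=((0, 0),)):
    """Yield all n-step self-avoiding walks on the square lattice as tuples of sites."""
    if len(sites) == n_steps + 1:
        yield sites
        return
    x, y = sites[-1]
    for dx, dy in MOVES:
        nxt = (x + dx, y + dy)
        if nxt not in sites:
            yield from enumerate_saws(n_steps, sites + (nxt,))

def contact_map(sites):
    """Pairs (i, j), j > i + 1, of sites that are lattice nearest neighbors."""
    return frozenset(
        (i, j)
        for i in range(len(sites))
        for j in range(i + 2, len(sites))
        if abs(sites[i][0] - sites[j][0]) + abs(sites[i][1] - sites[j][1]) == 1
    )

def degeneracies(n_steps):
    """Map each contact map to its degeneracy G, the number of walks that produce it."""
    return Counter(contact_map(s) for s in enumerate_saws(n_steps))

def mean_degeneracy_by_contacts(n_steps):
    """Average degeneracy over all maps with the same number of contacts N_c."""
    by_nc = defaultdict(list)
    for cmap, g in degeneracies(n_steps).items():
        by_nc[len(cmap)].append(g)
    return {nc: sum(gs) / len(gs) for nc, gs in sorted(by_nc.items())}

print(mean_degeneracy_by_contacts(8))
```

Histogramming $\ln G / N$ over the values returned by `degeneracies` gives the quantity $h(g)$ discussed in the text.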
FIG. 6. Histogram of $h(g) = \ln H(g)/N$, where $H(g)$ is the number of maps with degeneracy $G = e^{Ng}$, for
walks of length $N = 20$ on the square lattice. Separate curves are shown for the subsets of maps with
$c = 0.15, 0.2, 0.25, 0.3, 0.45$.

Let now $G(N_c)$ denote the average degeneracy over all the maps with $N_c$ contacts. We studied $G(N_c)$
as a function of $N_c$. As already mentioned, contact maps corresponding to maximally compact walks have,
on the average, a very small degeneracy. It seems reasonable to assume that for a fixed $c = N_c/N$, $G(N_c)$
will grow exponentially with $N$, such that

$$ \ln G(N, N_c) = a N f(c) . \qquad (10) $$

The enumeration results seem to support this assumption, as seen in the collapse plot in Fig. 7 with
$a = 0.86$ for the square lattice and $a = 1.07$ for the triangular lattice. The value of $a$ is extracted by
fitting $G(N, 0) \sim e^{aN}$. As we can see, the assumption of Eq. (10) seems to hold to good accuracy.

FIG. 7. Scaling plot of the degeneracy function $G(c)$ averaged over all the contact maps with $N_c$ contacts,
plotted for different chain lengths $N$ on the triangular and square lattices.



III. EXACT RESULTS FOR WALKS AND MAPS ON A LADDER 



In this section, we introduce and solve the problem exactly on a toy lattice. The lattice is a ladder of two
rows of sites, at points $(x, y)$ with $y = 0, 1$ and $x = 0, 1, 2, \ldots$ We consider all walks starting at
the origin with horizontal steps in the positive $x$ direction. We first show that the numbers of SAWs and of
contact maps are both exponential in $N$, with different coefficients $a$. Denote by $A(N)$ the number of walks
of $N$ steps;

$$ A(N) = A_h(N) + A_v(N) , $$

where $A_h(N)$ is the number of walks that end with a horizontal step and $A_v(N)$ is the number of walks that
end with a vertical step. Since a vertical step must be preceded by a horizontal one, we have

$$ A_v(N) = A_h(N - 1) . $$

On the other hand, to every walk one can add a horizontal step, so that

$$ A_h(N) = A(N - 1) . $$

Thus we get, using these three relationships, the recursion for the Fibonacci numbers,

$$ A(N) = A(N - 1) + A(N - 2) , \qquad (11) $$

and hence the number of walks grows exponentially for large $N$:

$$ \frac{\ln A(N)}{N} \to \ln \frac{1 + \sqrt{5}}{2} \approx 0.481 . \qquad (12) $$
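The Fibonacci recursion and the growth rate in Eq. (12) can be verified by brute force. This is a small sketch with a hypothetical function name; it assumes that the first step of a walk may be vertical, a boundary convention that changes the initial values but not the recursion or the asymptotic growth.

```python
import math

def count_ladder_walks(n_steps):
    """Count n-step walks on the ladder: each step goes +x or hops to the other row,
    and no site is visited twice."""
    def extend(pos, visited, remaining):
        if remaining == 0:
            return 1
        x, y = pos
        total = 0
        for nxt in ((x + 1, y), (x, 1 - y)):
            if nxt not in visited:
                visited.add(nxt)
                total += extend(nxt, visited, remaining - 1)
                visited.remove(nxt)
        return total
    return extend((0, 0), {(0, 0)}, n_steps)

A = [count_ladder_walks(n) for n in range(21)]
fibonacci_ok = all(A[n] == A[n - 1] + A[n - 2] for n in range(2, 21))
growth_rate = math.log(A[20] / A[19])
print(A[:8], fibonacci_ok, round(growth_rate, 3))
```

The ratio of successive counts approaches the golden ratio, whose logarithm is the value quoted in Eq. (12).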



A recursion for the number of contact maps can be calculated as well. One way to do this is by representing
$B(N)$, the total number of distinct contact maps of $N$ steps, as a sum

$$ B(N) = B_0(N) + B_1(N) , $$

where $B_0(N)$ is the number of contact matrices (maps) whose first row contains only zeroes (i.e. the first
site does not have a contact), and $B_1(N)$ is the number of those maps for which the first site does have a
contact. Since to every map we can add a first row (and column) of zeroes, we have

$$ B_0(N) = B(N - 1) . $$

For all maps that start with a contact, the first four steps are fixed; the corresponding walks can be continued
in two different ways, either with a vertical step or with a horizontal one. These two possibilities give rise
to a recursion of the form

$$ B_1(N) = B_1(N - 2) + B(N - 5) . $$

With a little algebra the last three equations yield the final recursion

$$ B(N) = B(N - 1) + B(N - 2) - B(N - 3) + B(N - 5) . \qquad (13) $$

If we now assume that

$$ \frac{\ln B(N)}{N} \to a_m $$

as $N$ becomes large, we find that $e^{a_m}$ is the solution of the equation

$$ q^5 - q^4 - q^3 + q^2 - 1 = 0 , $$

which yields $a_m \approx 0.367$, smaller than the value 0.481 found for the walks in Eq. (12).

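The recursion of Eq. (13) and the resulting growth rate can be checked numerically. The sketch below enumerates ladder walks by brute force, collects their distinct contact maps, and solves the characteristic equation by bisection. The function names are hypothetical, $N$ is taken as the number of steps, and walks are allowed to start with a vertical step; with these conventions the recursion is expected to hold once all of its arguments are non-negative, i.e. for $N \ge 5$.

```python
import math

def ladder_maps(n_steps):
    """Set of distinct contact maps of n-step ladder walks (maps stored as sets of site pairs)."""
    maps = set()

    def extend(sites):
        if len(sites) == n_steps + 1:
            maps.add(frozenset(
                (i, j)
                for i in range(len(sites))
                for j in range(i + 2, len(sites))
                if abs(sites[i][0] - sites[j][0]) + abs(sites[i][1] - sites[j][1]) == 1
            ))
            return
        x, y = sites[-1]
        for nxt in ((x + 1, y), (x, 1 - y)):
            if nxt not in sites:
                extend(sites + [nxt])

    extend([(0, 0)])
    return maps

def growth_factor():
    """Root in [1, 2] of q^5 - q^4 - q^3 + q^2 - 1 = 0, located by bisection."""
    f = lambda q: q ** 5 - q ** 4 - q ** 3 + q ** 2 - 1
    lo, hi = 1.0, 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

B = [len(ladder_maps(n)) for n in range(15)]
recursion_ok = all(
    B[n] == B[n - 1] + B[n - 2] - B[n - 3] + B[n - 5] for n in range(5, 15)
)
a_m = math.log(growth_factor())
print(B, recursion_ok, round(a_m, 3))
```

On the ladder a contact arises only from a vertical, horizontal, vertical sequence of steps, which is why the maps are so much fewer than the walks.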


Having counted the number of walks and maps, we turn to calculate various statistical features of maps and
walks on a ladder. For example, we can consider the fraction of maps with a given number of contacts, the
degeneracy of maps, i.e. the number of different walks that have the same map, etc. Analytical examination
of such quantities sheds light on the origin of results obtained from exact enumeration of walks in two
dimensions and indicates the extent to which the relatively short chains that can be enumerated represent
the true two-dimensional behavior.

A walk of $N$ steps taken according to the rules given above can be characterized by the sequence of the
contact-free intervals between all pairs of consecutive contacts. We denote by $m$ the number of steps needed
to walk from the end site of contact $n - 1$ to the end site of contact $n$. Let $D(m)$ denote the degeneracy of
such a contact-free walk, i.e. the number of different SAWs of length $m$. To calculate $D(m)$, we introduce a
transition matrix $L$ among six possible "states" of pairs of consecutive steps, referred to as "2-steps". The
six possible 2-steps that can occur on a ladder are shown in Fig. 8.

FIG. 8. The six possible 2-steps (pairs of successive steps) on a ladder, labeled 1 to 6.



The fact that $L_{42} = 1$ shows that it is possible to have a 3-step walk whose first and second steps
constitute a 2-step of type 2 and whose second and third steps constitute a 2-step of type 4. Note that only
those transitions that do not terminate the walk (i.e. do not generate a contact) are designated as possible
by the matrix $L$; for example we have $L_{34} = 0$, since a 4 followed by a 3 generates a contact.



$$ L = \left( \; 6 \times 6 \text{ matrix of zeros and ones} \; \right) \qquad (14) $$
Note that to have a contact by adding 2-step $n$, the 2-step $n - 1$ must be either of type 4 or 5; the
corresponding vectors are $V_4 = (000100)$ and $V_5 = (000010)$. The degeneracy of walks of length $m$ in
between contacts is then given by

D(m) = [(Vif + {V s ) T ]L m -^V 2 or [(V 4 ) T + (V^L^Vs (15) 

The possible lengths are $m = 2, 5, 6, 7, \ldots$, and the corresponding degeneracies are
$1, 1, 1, 1, 2, 3, 4, 6, 9, 13, 19, 28, 41, 60, \ldots$ Note that $D(100) \sim 10^{16}$ and asymptotically

$$ D(m) \propto (1.465)^m = e^{0.382 m} , \qquad (16) $$

where 1.465 is the largest eigenvalue of the matrix $L$.
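The listed degeneracies can be examined numerically. The sketch below starts from the values quoted above and checks that they are consistent with the three-term recursion $D(m) = D(m-1) + D(m-3)$ for $m \ge 8$. This recursion is an observation about the listed numbers, not a statement taken from the text, and extending the sequence with it is therefore an assumption; the growth factor it implies agrees with the eigenvalue 1.465.

```python
import math

# Values quoted in the text for the allowed lengths m = 2, 5, 6, ..., 17.
listed = [1, 1, 1, 1, 2, 3, 4, 6, 9, 13, 19, 28, 41, 60]
lengths = [2] + list(range(5, 18))
D = dict(zip(lengths, listed))

# Consistency of the quoted values with D(m) = D(m-1) + D(m-3) for m >= 8.
consistent = all(D[m] == D[m - 1] + D[m - 3] for m in range(8, 18))

# Assumption: the same recursion continues to hold for larger m.
for m in range(18, 101):
    D[m] = D[m - 1] + D[m - 3]

ratio = D[100] / D[99]
print(consistent, D[100], round(ratio, 4), round(math.log(ratio), 3))
```

The ratio of successive values converges to the real root of $x^3 = x^2 + 1$, which reproduces the exponent in Eq. (16).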

An $N \times N$ contact map is completely specified by the set of inter-contact intervals $\{m\}$. If for a
given map an interval of length $m$ appears $N(m)$ times, denote

$$ P(m) = N(m)/N . $$

The logarithm of the number of SAWs associated with this particular map is then given by

$$ \ln N_{SAW}(\{m\}) = N \sum_m P(m) \ln D(m) . \qquad (17) $$

The number of contacts of this map is given by

$$ N_c(\{m\}) = \sum_m N(m) = N \sum_m P(m) = N c , \qquad (18) $$

where the number of contacts per step is

$$ c = \sum_m P(m) = \frac{N_c}{N} . \qquad (19) $$

The normalization of the $P(m)$ is

$$ \sum_m P(m) \, m = 1 . \qquad (20) $$

The number of maps, $N_M$, characterized by the same set of fractions $\{P(m)\}$ (with different orderings of
the contact-free intervals) is

$$ \ln N_M = -N \sum_m P(m) \ln P(m) + N c \ln c , \qquad (21) $$

and therefore the number of SAWs, $N_{SAW}$, associated with all maps characterized by the same fractions
$\{P(m)\}$ is

$$ \ln N_{SAW}(\{P(m)\}) = -N \left\{ \sum_m P(m) \left[ \ln P(m) - \ln D(m) \right] - c \ln c \right\} . \qquad (22) $$

The interplay between these two terms is clear. As the distance $m$ between contacts increases, the number
of SAWs corresponding to such a map increases exponentially, but at the same time the number of contacts
in the map decreases and the number of such maps (permutations of the distances) decreases exponentially.
Some limiting cases can be analyzed as follows. For the case densest in contacts, i.e. $c = 0.5$, there are
only two possible maps and hence $\ln(N_{SAW})/N \to 0$. On the other hand, for maps with $O(1)$ contacts,
and hence $c \to 0$, $m$ scales with $N$ and $D(N) \propto e^{0.382 N}$, and therefore
$N_{SAW} \sim e^{0.382 N}$. Since in both limiting cases $\ln(N_{SAW})$ does not scale as $0.481 N$ (see
Eq. (12)), the quantity $\ln(N_{SAW})$ is expected to have a maximum at some intermediate value of $c$.

The number of SAWs associated with maps that have $N_c$ contacts can be studied analytically:

$$ N_{SAW}(c) = \int_0^1 \prod_m dP(m) \; \delta\!\left( \sum_m m P(m) - 1 \right) \delta\!\left( \sum_m P(m) - c \right) \exp\!\left( -N \left\{ \sum_m P(m) \left[ \ln P(m) - \ln D(m) \right] - c \ln c \right\} \right) . \qquad (23) $$

The integrals are evaluated by the saddle point method; the resulting equations can be reduced to the
following coupled equations for $P(2)$ and $P(5)$:

$$ 1 = P(2) \sum_m m \, D(m) \left[ P(5)/P(2) \right]^{(m-2)/3} , \qquad c = P(2) \sum_m D(m) \left[ P(5)/P(2) \right]^{(m-2)/3} , \qquad (24) $$

where for every allowed $m = 2, 5, 6, 7, \ldots$ the degeneracy $D(m)$ is determined by Eq. (15); these are
supplemented by

$$ P(m) = P(2) D(m) \left[ P(5)/P(2) \right]^{(m-2)/3} . \qquad (25) $$

The numerical solution of the saddle point equations, giving $\frac{1}{N} \ln N_{SAW}$ as a function of $c$,
the concentration of contacts, is presented in Fig. 9. The maximum $\frac{1}{N} \ln N_{SAW} = 0.481$, as
expected, is obtained for $c \approx 0.105$.
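The saddle point solution can be traced parametrically without a root finder. Writing $x = [P(5)/P(2)]^{1/3}$, Eq. (25) gives $P(m) = P(2) D(m) x^{m-2}$; for each $x$ the normalization (20) fixes $P(2)$, Eq. (19) gives $c$, and Eq. (22) gives $\frac{1}{N} \ln N_{SAW}$. The sketch below follows this route. Its assumptions are that $D(m)$ for $m \ge 8$ is generated by the recursion $D(m) = D(m-1) + D(m-3)$, which is inferred from the listed values rather than stated in the text, that the sums are truncated at a finite $m$, and that $x$ is kept below $1/1.465$ so that the sums converge. All function and parameter names are hypothetical.

```python
import math

def degeneracy_table(m_max):
    """D(m) for the allowed lengths m = 2, 5, 6, ...; the recursion for m >= 8 is an assumption."""
    D = {2: 1, 5: 1, 6: 1, 7: 1}
    for m in range(8, m_max + 1):
        D[m] = D[m - 1] + D[m - 3]
    return D

def entropy_curve(m_max=600, n_points=600, x_max=0.66):
    """Points (c, ln(N_SAW)/N) obtained by sweeping x = [P(5)/P(2)]^(1/3)."""
    D = degeneracy_table(m_max)
    log_D = {m: math.log(d) for m, d in D.items()}
    curve = []
    for k in range(1, n_points + 1):
        x = x_max * k / n_points
        w = {m: d * x ** (m - 2) for m, d in D.items()}    # P(m)/P(2), Eq. (25)
        p2 = 1.0 / sum(m * wm for m, wm in w.items())      # normalization, Eq. (20)
        c = p2 * sum(w.values())                           # contacts per step, Eq. (19)
        s = c * math.log(c)                                # + c ln c term of Eq. (22)
        for m, wm in w.items():
            p = p2 * wm
            if p > 0.0:
                s -= p * (math.log(p) - log_D[m])
        curve.append((c, s))
    return curve

curve = entropy_curve()
c_star, s_star = max(curve, key=lambda point: point[1])
print(round(c_star, 3), round(s_star, 3))
```

Setting every $D(m)$ to 1 in the same routine gives the corresponding curve for the number of maps.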
FIG. 9. $n_{SAW} = \ln(N_{SAW}/N)$ versus $c$, for maps of $N_c$ contacts.



One can calculate $\ln(N_M)$ as a function of $c$ in a similar fashion. All one has to do is to set $D(m) = 1$
in Eq. (23); the resulting saddle point equations are obtained from Eqs. (24)-(25) by again using $D(m) = 1$
there.

The numerical solution for $\ln(N_M)$ as a function of $c$, with the trivial end points $(0, 0)$ and $(0.5, 0)$,
is presented in Fig. 10.

FIG. 10. $n_M = \ln(N_M/N)$ versus $c$; $N_c$ is the number of contacts.

The final property of walks and maps on a ladder that we calculate deals with the degeneracy of a map
with $N_c$ contacts. Denote by $G = e^{Ng}$ the number of walks that have the same map and by $H(g, c)$ the
number of maps with $N_c$ contacts and this value of the degeneracy. The quantity $H(g, c)$ is given by

$$ H(g, c) = \int_0^1 \prod_m dP(m) \; \delta\!\left( \sum_m P(m) \ln D(m) - g \right) \delta\!\left( \sum_m m P(m) - 1 \right) \delta\!\left( \sum_m P(m) - c \right) \exp\!\left( -N \left\{ \sum_m P(m) \ln P(m) - c \ln c \right\} \right) , \qquad (26) $$

and the saddle point equations for $\{P(m)\}$ are

$$ P(m) = P(2) \left[ \frac{P(2) P(8)}{P(5)^2} \right]^{\ln D(m)/\ln 2} \left[ \frac{P(5)}{P(2)} \right]^{(m-2)/3} . \qquad (27) $$

The three unknown fractions $P(2)$, $P(5)$, $P(8)$ are determined through the three global constraints

$$ c = \sum_m P(m) , \qquad 1 = \sum_m P(m) \, m , \qquad g = \sum_m P(m) \ln D(m) . $$

A typical result, for $c = 0.2$, is presented in Fig. 11.

FIG. 11. Histogram of $h(g, c) = \ln H(g, c)/N$ versus $g$, for $c = 0.2$ on a ladder.

FIG. 12. Plot of $g$ versus $c$ on a ladder.



IV. SEMIDIRECTED RESTRICTED WALKS 



A related problem is that of semidirected restricted walks (SRW) on a square lattice. These walks are
defined as follows: all horizontal steps are directed, in the $+x$ direction. Vertical steps are restricted so
that the number of consecutive vertical steps never exceeds $k$. The $k = 1$ case is already a superset of the
walks on a ladder.

The number of SRWs can be computed as follows.

Denote the total number of walks by $A(N)$. As before, $A_h(N)$ of these walks end with a horizontal step
and $A_v(N)$ walks end with a vertical step:

$$ A(N) = A_h(N) + A_v(N) . $$

$A_v(N)$ can be further classified into $k$ classes, where $A_v^i(N)$ corresponds to walks that end with exactly
$i$ vertical steps:

$$ A_v(N) = A_v^1(N) + A_v^2(N) + \cdots + A_v^k(N) , $$
$$ A_v^i(N) = A_v^{i-1}(N - 1) , $$
$$ A_v^1(N) = 2 A_h(N - 1) . $$

A little algebra gives the following recursive relation:

$$ A(N) = A(N - 1) + 2 \left( A(N - 2) + \cdots + A(N - k - 1) \right) . $$

So the connective constant (of exponential growth) is given by the root of the following polynomial
equation:

$$ (y - 1) \, y^k = 2 \, \frac{1 - y^k}{1 - y} . $$

For $k = 1$ this reduces to $(y - 1) y = 2$, i.e. $y = 2$, whereas in the $k \to \infty$ limit it simplifies to
$(y - 1)^2 = 2$, so that the connective constant increases to $y = 1 + \sqrt{2} \approx 2.41$.
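The recursion for the number of SRWs and the two limiting values of the connective constant can be checked with a short brute-force count. This is a sketch with hypothetical function names; it assumes that a walk may start with a vertical run, so the recursion is tested only for $N \ge k + 1$, where all of its arguments are non-negative. The polynomial solved below is the characteristic equation of the recursion, $y^{k+1} = y^k + 2 (y^{k-1} + \cdots + 1)$, which is equivalent to the equation in the text.

```python
import math

def count_srw(n_steps, k):
    """Count n-step walks with steps R (+x), U (+y), D (-y), no immediate U/D reversal,
    and at most k consecutive vertical steps (k >= 1)."""
    def extend(last, run, remaining):
        if remaining == 0:
            return 1
        total = extend("R", 0, remaining - 1)       # a horizontal step is always allowed
        for v in ("U", "D"):
            if last == "R":
                total += extend(v, 1, remaining - 1)            # start a vertical run
            elif last == v and run < k:
                total += extend(v, run + 1, remaining - 1)      # continue the run
        return total
    return extend("R", 0, n_steps)

def connective_constant(k):
    """Root in [1, 3] of y^(k+1) - y^k - 2 (y^(k-1) + ... + 1) = 0, by bisection."""
    f = lambda y: y ** (k + 1) - y ** k - 2 * sum(y ** j for j in range(k))
    lo, hi = 1.0, 3.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for k in (1, 2, 3):
    A = [count_srw(n, k) for n in range(12)]
    ok = all(
        A[n] == A[n - 1] + 2 * sum(A[n - 1 - i] for i in range(1, k + 1))
        for n in range(k + 1, 12)
    )
    print(k, A[:6], ok, round(connective_constant(k), 4))
```

For large $k$ the computed root approaches $1 + \sqrt{2}$, in line with the limit quoted above.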

Computing the number of contact matrices for a general $k$ seems slightly more tedious, but it is possible
to do it explicitly for $k = 1$. We denote by $B(N)$ the number of distinct maps of size $N$. The maps can be
classified into those with no contact and those with one contact in the first row; the number of the former is
$B_0(N)$ and of the latter $B_1(N)$:

$$ B(N) = B_0(N) + B_1(N) , $$
$$ B_0(N) = B(N - 1) , $$
$$ B_1(N) = B(N - 4) + B_1(N - 2) . $$

A little algebra leads to the following recursive relation:

$$ B(N) = B(N - 1) + B(N - 2) - B(N - 3) + B(N - 4) , $$

which, in turn, leads to the following polynomial equation:

$$ q^4 - q^3 - q^2 + q - 1 = 0 . $$

The root, $q \approx 1.51$, corresponding to the growth factor for the maps, is slightly higher than that of the
ladder ($\approx 1.44$). We have not found a simple way to compute $B(N)$ for general $k$ values.
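The $k = 1$ recursion for the maps can be checked in the same brute-force manner as for the ladder. In the sketch below the function names are hypothetical, $N$ is taken as the number of steps, and a walk may start with a vertical step; under these conventions the recursion is tested for $N \ge 4$.

```python
def srw_maps(n_steps):
    """Distinct contact maps of n-step k = 1 semidirected walks
    (steps +x, +y, -y with no two consecutive vertical steps)."""
    maps = set()

    def extend(sites, last_vertical):
        if len(sites) == n_steps + 1:
            maps.add(frozenset(
                (i, j)
                for i in range(len(sites))
                for j in range(i + 2, len(sites))
                if abs(sites[i][0] - sites[j][0]) + abs(sites[i][1] - sites[j][1]) == 1
            ))
            return
        x, y = sites[-1]
        extend(sites + [(x + 1, y)], False)
        if not last_vertical:
            extend(sites + [(x, y + 1)], True)
            extend(sites + [(x, y - 1)], True)

    extend([(0, 0)], False)
    return maps

def map_growth_factor():
    """Root in [1, 2] of q^4 - q^3 - q^2 + q - 1 = 0, located by bisection."""
    f = lambda q: q ** 4 - q ** 3 - q ** 2 + q - 1
    lo, hi = 1.0, 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

B = [len(srw_maps(n)) for n in range(13)]
recursion_ok = all(
    B[n] == B[n - 1] + B[n - 2] - B[n - 3] + B[n - 4] for n in range(4, 13)
)
print(B, recursion_ok, round(map_growth_factor(), 3))
```

Compared with the ladder, the extra freedom in the vertical direction shortens the minimal gap between unrelated contacts, which is reflected in the term $B(N - 4)$ replacing $B(N - 5)$.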

V. SUMMARY 

Contact maps are a compact and useful representation of a protein's structure. Contact maps are used 
for screening candidate structures from a database. More recently attempts were made to use them to fold 
proteins, i.e. determine the map of a protein of known sequence by minimizing some energy function. 
In order to have a handle on the work involved in searching the subspace of physical maps, it is important
to know various statistics: for example, how the number of physical maps increases with the protein's length,
the dependence of various properties on the number of contacts, etc. In this paper we studied these issues on
several lattices; for an essentially one-dimensional ladder the results were obtained analytically and in two 
dimensions we studied the square and triangular lattices by exact enumeration and sampling. In addition 
we provide exact bounds on the number of distinct physical maps, valid in any dimension. 

Our main findings can be summarized as follows: 

• The number of physical contact maps scales exponentially with the length N of the walk. 

• The number of contact maps (and of walks as well) is a non-monotonic function of the number of 
contacts. 

• The average degeneracy of contact maps that have $N_c$ contacts decreases as $N_c$ increases.

• Contact maps corresponding to very compact walks (i.e. highest Nc) have low degeneracy. The ground 
state of a protein is most likely to be found among these maps. 